136 research outputs found

A Bi-Criteria Algorithm for Scheduling Parallel Task Graphs on Clusters

Author: Desprez Frédéric
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 17/05/2010

Applications structured as parallel task graphs exhibit both data and task parallelism, and arise in many domains. Scheduling these applications on parallel platforms has been a long-standing challenge. In the case of a single homogeneous cluster, most existing algorithms focus on reducing the application completion time (makespan). However, in the presence of resource managers such as batch schedulers, and given the growing pressure of energy concerns, the produced schedules also have to be efficient in terms of resource usage. In this paper we propose a novel bi-criteria algorithm, called biCPA, able to optimize these two performance metrics either simultaneously or separately. Using simulation over a wide range of experimental scenarios, we find that biCPA leads to better results than previously published algorithms.

HAL-ENS-LYON

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot

Impact of Mixed-Parallelism on Parallel Implementations of Strassen and Winograd Matrix Multiplication Algorithms

Author: Desprez Frédéric
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 01/01/2002

In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism on the Strassen and Winograd matrix multiplication algorithms. We present two mixed-parallel implementations. The former follows the phases of the original algorithms, while the latter has been designed as the result of a list scheduling algorithm. We give a theoretical comparison, in terms of memory usage and execution time, between our algorithms and classical data-parallel implementations. This analysis is corroborated by experiments. Finally, we give some hints about a heterogeneous version of our algorithms.

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Improving the Accuracy and Efficiency of Time-Independent Trace Replay

Author: Desprez Frédéric
Markomanolis George
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 01/10/2012

Simulation is a popular approach to obtain objective performance indicators on platforms that are not at one's disposal. It may help the dimensioning of compute clusters in large computing centers. In a previous work, we proposed a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is to rely on time-independent execution traces. This allows us to completely decouple the acquisition process from the actual replay of the traces in a simulation context. We are then able to acquire traces for large application instances without being limited to an execution on a single compute cluster. Finally, our framework is built on top of a scalable, fast, and validated simulation kernel. In this paper, we detail the performance issues that we encountered with the first implementation of our trace replay framework. We propose several modifications to address these issues and analyze their impact. Results show a clear improvement in accuracy and efficiency with regard to the initial implementation.

HAL-ENS-LYON

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot

Evaluation of Profiling Tools for the Acquisition of Time Independent Traces

Author: Desprez Frédéric
Markomanolis George
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 08/07/2013

In a previous work, we proposed a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is to rely on time-independent execution traces, which are an original way to estimate the performance of parallel applications. To acquire time-independent traces of the execution of MPI applications, we have to instrument them to log the necessary information. Many profiling tools can instrument an application. In this report we propose a scoring system that corresponds to our framework's specific requirements and use it to evaluate the best-known open-source profiling tools. Furthermore, we introduce an original tool called Minimal Instrumentation that was designed to fulfill the requirements of our framework.

HAL-ENS-LYON

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot

Dynamic Performance Forecasting for Network-Enabled Servers in a Heterogeneous Environment

Author: Desprez Frédéric
Quinson Martin
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 01/01/2001

This paper presents a tool for dynamically forecasting the performance of Network-Enabled Servers. FAST (Fast Agent's System Timer) is a software package allowing client applications to get an accurate forecast of communication and computation times and memory use in a heterogeneous environment. It relies on low-level software packages, i.e., network and host monitoring tools, and on some of our developments in the modeling of computation routines. The FAST internals and user interface are presented, and a comparison is given between the execution time predicted by FAST and the measured time of a complex matrix multiplication executed on a heterogeneous platform.

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

One-Step Algorithm for Mixed Data and Task Parallel Scheduling Without Data Replication

Author: Boudet Vincent
Desprez Frédéric
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 01/01/2002

In this paper we propose an original algorithm for mixed data and task parallel scheduling. The main specificities of this algorithm are that it performs the allocation and scheduling processes simultaneously and avoids data replication. The idea is to base the scheduling on an accurate evaluation of each task of the application depending on the processor grid. No assumption is therefore made with regard to the homogeneity of the execution platform. The complexity of our algorithm is given. The performance achieved by our schedules, in both homogeneous and heterogeneous settings, is compared to data-parallel executions for two applications: the complex matrix multiplication and the Strassen decomposition.

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Time-Independent Trace Acquisition Framework -- A Grid'5000 How-to

Author: Markomanolis George
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 01/05/2011

This manual describes step by step how to create a Grid'5000 appliance that comprises all the tools needed to acquire time-independent traces of the execution of an MPI application. Time-independent traces are an original way to estimate the performance of parallel applications. They make it possible to totally decouple the acquisition of a trace from its replay in a simulation framework. This manual also details the different acquisition scenarios allowed by this approach. Traces can be acquired in a very classical way, by folding the execution on fewer resources, or by scattering the execution across multiple clusters.

HAL-ENS-LYON

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot

SimGrid: a Sustained Effort for the Versatile Simulation of Large Scale Distributed Systems

Author: Casanova Henri
Giersch Arnaud
Legrand Arnaud
Quinson Martin
Suter Frédéric
Publication date: 01/01/2013

In this paper we present SimGrid, a toolkit for the versatile simulation of large scale distributed systems, whose development effort has been sustained for the last fifteen years. Over this time period SimGrid has evolved from a one-laboratory project in the U.S. into a scientific instrument developed by an international collaboration. The keys to making this evolution possible have been securing funding, improving the quality of the software, and increasing the user base. In this paper we describe how we have been able to make advances on all three fronts, on which we plan to intensify our efforts over the upcoming years. Comment: 4 pages, submission to WSSSPE'1

arXiv.org e-Print Archive

HAL-ENS-LYON

CiteSeerX

HAL-IN2P3

HAL - Université de Franche-Comté

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot

Assessing the Performance of MPI Applications Through Time-Independent Trace Replay

Author: Desprez Frédéric
Markomanolis George
Quinson Martin
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 01/01/2010

Simulation is a popular approach to obtain objective performance indicators on platforms that are not at one's disposal. It may help the dimensioning of compute clusters in large computing centers. In this work we present a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is to rely on time-independent execution traces. This allows us to completely decouple the acquisition process from the actual replay of the traces in a simulation context. We are then able to acquire traces for large application instances without being limited to an execution on a single compute cluster. Finally, our framework is built on top of a scalable, fast, and validated simulation kernel. In this paper, we present the time-independent trace format used, investigate several acquisition strategies, detail the trace replay tool we developed, and assess the quality of our simulation framework in terms of accuracy, acquisition time, simulation time, and trace size.

HAL-ENS-LYON

CiteSeerX

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot

Budget Constrained Resource Allocation for Non-Deterministic Workflows on an IaaS Cloud

Author: Caron Eddy
Desprez Frédéric
Muresan Adrian
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 14/05/2012

Many scientific applications are described through workflow structures. Due to the increasing level of parallelism offered by modern computing infrastructures, workflow applications now have to be composed not only of sequential programs, but also of parallel ones. Cloud platforms bring on-demand resource provisioning and pay-as-you-go pricing. The execution of a workflow then corresponds to a certain budget. The current work addresses the problem of resource allocation for non-deterministic workflows under budget constraints. We present a way of transforming the initial problem into sub-problems that have been studied before. We propose two new allocation algorithms that are capable of determining resource allocations under budget constraints, and we present ways of using them to address the problem at hand.

HAL-ENS-LYON

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot